 Vienna


Suspected Undeclared Use of Artificial Intelligence in the Academic Literature: An Analysis of the Academ-AI Dataset

arXiv.org Artificial Intelligence

Since generative artificial intelligence (AI) tools such as OpenAI's ChatGPT became widely available, researchers have used them in the writing process. The consensus of the academic publishing community is that such usage must be declared in the published article. Academ-AI documents examples of suspected undeclared AI usage in the academic literature, discernible primarily due to the appearance in research papers of idiosyncratic verbiage characteristic of large language model (LLM)-based chatbots. This analysis of the first 500 examples collected reveals that the problem is widespread, penetrating the journals and conference proceedings of highly respected publishers. Undeclared AI seems to appear in journals with higher citation metrics and higher article processing charges (APCs), precisely those outlets that should theoretically have the resources and expertise to avoid such oversights. An extremely small minority of cases are corrected post publication, and the corrections are often insufficient to rectify the problem. The 500 examples analyzed here likely represent a small fraction of the undeclared AI present in the academic literature, much of which may be undetectable. Publishers must enforce their policies against undeclared AI usage in cases that are detectable; this is the best defense currently available to the academic publishing community against the proliferation of undisclosed AI.


GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer

arXiv.org Artificial Intelligence

Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications. Traditional NER models are effective but limited to a set of predefined entity types. In contrast, Large Language Models (LLMs) can extract arbitrary entities through natural language instructions, offering greater flexibility. However, their size and cost, particularly for those accessed via APIs like ChatGPT, make them impractical in resource-limited scenarios. In this paper, we introduce a compact NER model trained to identify any type of entity. Leveraging a bidirectional transformer encoder, our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of LLMs. Through comprehensive testing, GLiNER demonstrates strong performance, outperforming both ChatGPT and fine-tuned LLMs in zero-shot evaluations on various NER benchmarks.


AI tech identifies suicide risk in military veterans before it's too late: 'Flipping the model'

FOX News

U.S. Marine Corps veteran Adam Cooper is joined by Army veteran Lowell Koppert as he nears the end of his 22-hour workout and shares his 'radical' pledge to bring more awareness to the issue of veteran suicides. If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255). As the mental health of U.S. military veterans remains a major concern among many people in our society, new technology could become a lifesaver. An AI platform developed by ClearForce, a tech company in Vienna, Virginia, aims to identify the risk of suicide among veterans before it's too late. Col. Michael Hudson, vice president at ClearForce, spoke to Fox News Digital in an interview to discuss his efforts on the veteran suicide initiative.


FoundationDB: A Distributed Key-Value Store

Communications of the ACM

FoundationDB is an open-source transactional key-value store created more than 10 years ago. It is one of the first systems to combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions. FoundationDB adopts an unbundled architecture that decouples an in-memory transaction management system, a distributed storage system, and a built-in distributed configuration system. Each sub-system can be independently provisioned and configured to achieve scalability, high availability, and fault tolerance. FoundationDB includes a deterministic simulation framework, used to test every new feature under a myriad of possible faults. This rigorous testing makes FoundationDB extremely stable and allows developers to introduce and release new features in a rapid cadence. FoundationDB offers a minimal and carefully chosen feature set, which has enabled a range of disparate systems to be built as layers on top. FoundationDB is the underpinning of cloud infrastructure at Apple, Snowflake, and other companies, due to its consistency, robustness, and availability for storing user data, system metadata and configuration, and other critical information. Many cloud services rely on scalable, distributed storage backends for persisting application state. Such storage systems must be fault tolerant and highly available, and at the same time provide sufficiently strong semantics and flexible data models to enable rapid application development. Such services must scale to billions of users, petabytes or exabytes of stored data, and millions of requests per second. More than a decade ago, NoSQL storage systems emerged offering ease of application development, making it simple to scale and operate storage systems, offering fault-tolerance and supporting a wide range of data models (instead of the traditional rigid relational model). 
In order to scale, these systems sacrificed transactional semantics, and instead provided eventual consistency, forcing application developers to reason about interleavings of updates from concurrent operations. FoundationDB (FDB) [3] was created in 2009 and gets its name from the focus on providing what we saw as the foundational set of building blocks required to build higher-level distributed systems.


A survey on knowledge-enhanced multimodal learning

arXiv.org Artificial Intelligence

Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation. Especially in the area of visiolinguistic (VL) learning, multiple models and techniques have been developed, targeting a variety of tasks that involve images and text. VL models have reached unprecedented performances by extending the idea of Transformers, so that both modalities can learn from each other. Massive pre-training procedures enable VL models to acquire a certain level of real-world understanding, although many gaps can be identified: the limited comprehension of commonsense, factual, temporal and other everyday knowledge aspects questions the extendability of VL tasks. Knowledge graphs and other knowledge sources can fill those gaps by explicitly providing missing information, unlocking novel capabilities of VL models. At the same time, knowledge graphs enhance explainability, fairness and validity of decision making, issues of utmost importance for such complex implementations. The current survey aims to unify the fields of VL representation learning and knowledge graphs, and provides a taxonomy and analysis of knowledge-enhanced VL models.


Converting Laws to Programs

Communications of the ACM

You would think something as numerical as income tax law would be similar to mathematical logic, but it is not, Protzenko says, because it is not written with the precision and clarity that would "make it amenable to a very mathematical reading of it." For example, that law does not mention that a number may need to be rounded into whole cents. "The law won't tell you what you're supposed to do with rounding numbers and that can lead to ambiguity and a lack of specification of what's supposed to happen," he says. Healthcare law is also very complex. Faisal Khan, senior legal counsel at healthcare law firm Nixon Gwilt Law in Vienna, VA, says, "Software for HIPAA compliance must incorporate algorithms that target and hit on all the top-level statutory requirements and implementing regulations." To make that happen, Khan says, "There must be a team of compliance-related input as many of the regulations essentially function as guidelines for companies to adhere to." That means a process or ...
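
The rounding ambiguity Protzenko describes can be made concrete with a small sketch. The income, rate, and function name below are hypothetical and not drawn from any statute or from the article; the point is only that a program must pick a rounding rule even when the legal text does not, and that the choice changes the result by a cent.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def tax_due(income: Decimal, rate: Decimal, rounding: str) -> Decimal:
    """Multiply income by rate and round to whole cents using the given mode.

    Hypothetical helper for illustration; no real tax rule is encoded here.
    """
    return (income * rate).quantize(CENT, rounding=rounding)


# Hypothetical figures chosen so the exact product (250.125) falls
# exactly halfway between two whole-cent amounts.
income = Decimal("1000.50")
rate = Decimal("0.25")

for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_DOWN):
    print(mode, tax_due(income, rate, mode))
```

With these inputs, rounding half up yields 250.13, while rounding half to even or truncating yields 250.12. A statute that is silent on the rule leaves that one-cent difference unspecified.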


Top 10 Cognitive Computing Startups Situated in India in 2021

#artificialintelligence

Cognitive computing is the use of computerized models to simulate the human thought process in complex situations where the answers may be ambiguous and uncertain. The term is closely associated with IBM's cognitive computer system, Watson. Cognitive computing overlaps with AI and involves many of the same underlying technologies to power cognitive applications, including expert systems, neural networks, robotics, and virtual reality (VR). Marlabs Inc is a digital firm that has offices in Piscataway, N.J., and Bangalore, India. Founded in 1996, the company has 1,500 employees with over two decades of experience in CRM consulting, SI, and big data consulting.


These high school students are fighting for ethical AI

#artificialintelligence

It's been a busy year for Encode Justice, an international group of grassroots activists pushing for ethical uses of artificial intelligence. There have been legislators to lobby, online seminars to hold, and meetings to attend, all in hopes of educating others about the harms of facial-recognition technology. It would be a lot for any activist group to fit into the workday; most of the team behind Encode Justice have had to cram it all in around high school. That's because the group was created and is run almost entirely by high schoolers. Its founder and president, Sneha Revanur, is a 16-year-old high-school senior in San Jose, California, and at least one of the members of the leadership team isn't old enough to get a driver's license.


Tech Tip: Can Artificial Intelligence Improve Your Business?

#artificialintelligence

Artificial Intelligence (AI) has been one of the leading technology topics for years now, and it's not that surprising considering how many software developers are using the concept. It took many people years to realize that algorithmic machine learning could have massive benefits for business and society; they just thought AI was for the development of cyborgs. In business today, however, AI can be found in all types of applications. Let's take a look at a couple of ways AI is currently being used in business. One of the most important uses of AI is in cybersecurity, where much of the work is identifying actual threats and eliminating them before they can cause any problems for a business.


Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

arXiv.org Artificial Intelligence

A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities. Humans use subtle reasoning patterns based on knowledge of entity facts, relations, and types to disambiguate unfamiliar entities. Inspired by these patterns, we introduce Bootleg, a self-supervised NED system that is explicitly grounded in reasoning patterns for disambiguation. We define core reasoning patterns for disambiguation, create a learning procedure to encourage the self-supervised model to learn the patterns, and show how to use weak supervision to enhance the signals in the training data. Encoding the reasoning patterns in a simple Transformer architecture, Bootleg meets or exceeds state-of-the-art on three NED benchmarks. We further show that the learned representations from Bootleg successfully transfer to other non-disambiguation tasks that require entity-based knowledge: we set a new state-of-the-art in the popular TACRED relation extraction task by 1.0 F1 points and demonstrate up to 8% performance lift in highly optimized production search and assistant tasks at a major technology company.